Automatic Bilingual Legacy-Fonts Identification and Conversion System

Authors

  • Gurpreet Singh Lehal
  • Tejinder Singh Saini
  • Saini Pretpal Kaur Buttar
Abstract

Digital text written in an Indian script is difficult to use as such, because many font formats are available for typing and these formats are not mutually compatible. Gurmukhi alone has more than 225 popular ASCII-based fonts, and Devanagari has 180. To read text written in a particular font, that font must be installed on the system. This paper describes a language- and font-detection system for Gurmukhi and Devanagari, together with a font-conversion system that converts ASCII-based text into Unicode. The proposed system works in two stages: the first stage proposes a statistical model for automatic language detection (i.e., Gurmukhi or Devanagari) and font detection; the second stage converts the detected text into Unicode according to the detected font. Although we could not train the system for some fonts because font converters were not available, the system and its architecture are open to accepting any number of languages and fonts in the future. The existing system supports around 150 popular Gurmukhi font encodings and more than 100 popular Devanagari fonts. We have demonstrated that font detection is 99.6% effective and Unicode conversion is 100% effective in all cases.
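The two-stage flow described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the statistical model, so the sketch assumes a simple character-frequency profile per font scored by smoothed log-likelihood. The font names (`ToyGurmukhiFont`, `ToyDevanagariFont`), the key-to-character tables, and the training samples are all invented for the example and match no real legacy font.

```python
import math
from collections import Counter

# Hypothetical fonts: each table maps ASCII keys to Unicode characters.
# The key assignments are invented for illustration only.
FONT_TABLES = {
    ("Gurmukhi", "ToyGurmukhiFont"): {
        "k": "\u0A15", "m": "\u0A2E", "l": "\u0A32", "w": "\u0A3E",
    },
    ("Devanagari", "ToyDevanagariFont"): {
        "d": "\u0915", "e": "\u092E", "y": "\u0932", "k": "\u093E",
    },
}

# Hypothetical ASCII training samples typed in each toy font.
TRAINING_SAMPLES = {
    ("Gurmukhi", "ToyGurmukhiFont"): "kwm kwl mwl lwl kml",
    ("Devanagari", "ToyDevanagariFont"): "dke dky eky yky dey",
}

# Stage 1 model (assumed): one character-frequency profile per (language, font).
PROFILES = {key: Counter(sample) for key, sample in TRAINING_SAMPLES.items()}


def log_likelihood(text, profile, vocab_size=256):
    """Score text against a profile using add-one smoothed unigram probabilities."""
    total = sum(profile.values())
    return sum(math.log((profile[ch] + 1) / (total + vocab_size)) for ch in text)


def detect(text):
    """Stage 1: pick the (language, font) pair whose profile best explains the text."""
    return max(PROFILES, key=lambda key: log_likelihood(text, PROFILES[key]))


def convert(text, table):
    """Stage 2: map each ASCII key to Unicode; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def identify_and_convert(text):
    """Run both stages and return (language, font, unicode_text)."""
    language, font = detect(text)
    return language, font, convert(text, FONT_TABLES[(language, font)])


if __name__ == "__main__":
    print(identify_and_convert("kwl"))
    print(identify_and_convert("dky"))
```

The same key (`k` here) stands for different characters in the two toy fonts, which is why detection has to come before conversion. A practical converter would also need to handle details this sketch leaves out, such as reordering vowel signs that legacy fonts store in visual order.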

Similar resources

Automatic Design of Persian Typefaces

In this paper, a fast method for automatic generation and scientific design of Persian letters is proposed. Scientific typeface design is an approach in which fonts are described by mathematical curves with well-defined parameters, where these parameters can be automatically tuned. METAFONT is a language suitable for the type of design used in this work. This language is particularly useful...

An Omni-Font Gurmukhi to Shahmukhi Transliteration System

This paper describes a font-independent Gurmukhi-to-Shahmukhi transliteration system. Although Unicode is gaining popularity, there is still a lot of material in Punjabi that is available in ASCII-based fonts. A problem with ASCII fonts for Punjabi is that there is no standardisation of the mapping of Punjabi characters, and a Gurmukhi character may be internally mapped to different keys in differ...

GFUC: Gurmukhi Font and Unicode Converter

Growth of information technology has played a great role in connecting the world together. The to and fro of information is common in this world. Fonts play a key role in this communication process in the digital domain. A common encoding scheme for a language helps in loss-less digital communication. Indian fonts are lacking in this respect, as no Indian font has a standard encoding format for mapping ...

Identification of Bilingual Terms from Monolingual Documents for Statistical Machine Translation

The automatic translation of domain-specific documents is often a hard task for generic Statistical Machine Translation (SMT) systems, which are not able to correctly translate the large number of terms encountered in the text. In this paper, we address the problems of automatic identification of bilingual terminology using Wikipedia as a lexical resource, and its integration into an SMT system...

Journal:
  • Research in Computing Science

Volume 86

Publication date: 2014